
Conversation

smklein
Collaborator

@smklein smklein commented Aug 27, 2025

Split off of #8845

Reads records on boot, validates access.

Depends on queries from #8932

Fixes #8501

@smklein smklein marked this pull request as draft August 27, 2025 22:29
@smklein smklein force-pushed the db_metadata_nexus_records branch from fd96652 to d458639 August 27, 2025 22:45
@smklein smklein force-pushed the db_metadata_nexus_records_usage branch from 3bbe988 to d7b63e3 August 27, 2025 22:45
@smklein smklein force-pushed the db_metadata_nexus_records branch from d458639 to 6e20a24 August 27, 2025 23:14
@smklein smklein changed the base branch from db_metadata_nexus_records to db_metadata_nexus_queries August 27, 2025 23:17
@smklein smklein changed the title from "(2/N) Read database access records on boot" to "(3/N) Read database access records on boot" Aug 27, 2025
@smklein smklein force-pushed the db_metadata_nexus_records_usage branch from d7b63e3 to 1f66310 August 27, 2025 23:17
@smklein smklein force-pushed the db_metadata_nexus_queries branch from e301683 to b3e696e August 27, 2025 23:33
@smklein smklein force-pushed the db_metadata_nexus_records_usage branch from 1f66310 to 60d389f August 27, 2025 23:33
@smklein smklein changed the title from "(3/N) Read database access records on boot" to "(4/N) Read database access records on boot" Aug 27, 2025
@smklein smklein changed the base branch from db_metadata_nexus_queries to db_metadata_nexus_handoff August 27, 2025 23:34
@smklein smklein force-pushed the db_metadata_nexus_records_usage branch from 60d389f to c66f6ce August 28, 2025 22:53
@smklein smklein force-pushed the db_metadata_nexus_handoff branch from c3be777 to c289efd August 28, 2025 22:53
@smklein smklein force-pushed the db_metadata_nexus_records_usage branch from c66f6ce to f1235b5 August 28, 2025 23:03
@smklein smklein changed the title from "(4/N) Read database access records on boot" to "(5/N) Read database access records on boot" Aug 28, 2025
@smklein smklein marked this pull request as ready for review August 28, 2025 23:13
@smklein smklein force-pushed the db_metadata_nexus_handoff branch from 7d2b6e1 to 6afc8af August 29, 2025 01:26
@smklein smklein force-pushed the db_metadata_nexus_records_usage branch from f1235b5 to 68a155c August 29, 2025 01:26
@smklein smklein force-pushed the db_metadata_nexus_handoff branch from 6afc8af to 8d9782a August 29, 2025 15:48
@smklein smklein force-pushed the db_metadata_nexus_records_usage branch from 68a155c to 0866165 August 29, 2025 15:48
@smklein smklein force-pushed the db_metadata_nexus_handoff branch from 8d9782a to 939eaf9 August 29, 2025 21:05
@smklein smklein force-pushed the db_metadata_nexus_records_usage branch from 0866165 to 9db5b65 August 29, 2025 21:05
@smklein smklein force-pushed the db_metadata_nexus_handoff branch from 939eaf9 to ed8c3b7 August 29, 2025 21:22
@smklein smklein force-pushed the db_metadata_nexus_records_usage branch 2 times, most recently from 51febae to f5aa67c August 29, 2025 22:17
@smklein smklein force-pushed the db_metadata_nexus_handoff branch 2 times, most recently from a122af5 to 653a08f August 30, 2025 00:43
@smklein smklein force-pushed the db_metadata_nexus_records_usage branch from f5aa67c to 5ba6210 August 30, 2025 00:43
@smklein smklein force-pushed the db_metadata_nexus_handoff branch from 653a08f to c329ca8 August 30, 2025 00:51
@smklein smklein force-pushed the db_metadata_nexus_records_usage branch from 5ba6210 to 79a12dd August 30, 2025 00:51
@smklein smklein force-pushed the db_metadata_nexus_handoff branch from c329ca8 to ebaa6a2 September 2, 2025 16:25
@smklein smklein force-pushed the db_metadata_nexus_records_usage branch from 79a12dd to 66e720e September 2, 2025 16:25
smklein added a commit that referenced this pull request Sep 2, 2025
Split off of #8845

Adds and tests queries which will be used in integration (reading on
boot): #8925

Does not actually flip Nexus to use these records yet.

Depends on #8924

Next part of #8501: Adding queries for these records
@smklein smklein force-pushed the db_metadata_nexus_handoff branch from ebaa6a2 to 38a265b September 2, 2025 18:46
@smklein smklein force-pushed the db_metadata_nexus_records_usage branch from 66e720e to 3e61a97 September 2, 2025 18:46
DatastoreSetupAction::NeedsHandoff { nexus_id } => {
    info!(log, "Datastore is awaiting handoff");

    datastore.attempt_handoff(*nexus_id).await.map_err(
Collaborator Author

This is implemented in #8932

"Could not handoff to new nexus";
err
);
BackoffError::transient(
Contributor

Are all the error cases here transient? Looking at 8932, maybe NexusInWrongState is a permanent error?

Collaborator Author

I think even in this case, I do want us to retry.

Right now, it's actually not possible for us to hit the NexusInWrongState case:

  • It's returned from a transaction that checks "all nexus state records are either quiesced or not_yet"
  • ... and it's returned when our record specifically != not_yet, therefore it must be quiesced.
  • ... but if our record is quiesced, the previous call to check_schema_and_access would have returned Refuse, rather than NeedsHandoff
  • ... further, the transition from active to quiesced shouldn't be racy (without operator intervention) because each Nexus should be responsible for performing this transition for itself.

So, TL;DR:

  • This check is currently defensive, and not possible without someone poking db records manually
  • When we retry, we retry calling check_schema_and_access again, before re-trying attempt_handoff
  • If we do that, while also being quiesced, we'll converge to "locked out of the db"
  • ... BUT ALSO in the future, or in the face of weird concurrent events, IMO the decision to "restart all checks" doesn't seem like a bad choice.

I think that seeing a weird state in one of these branches, and choosing to "re-evaluate the world again from scratch" seems like a reasonable choice. check_schema_and_access should be able to determine a reasonable next step, and it only throws errors if we cannot access the database (which I consider to be transient).
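
As a rough illustration of that "re-evaluate the world from scratch" shape (a minimal sketch: the type and function names below are simplified stand-ins drawn from this thread, and the signatures are assumptions, not the actual omicron code):

enum DatastoreSetupAction {
    // Proceed (and, once #8912 lands, apply schema updates).
    Update,
    // Wait for / attempt a handoff from the currently-active Nexus generation.
    NeedsHandoff { nexus_id: u64 },
    // e.g. our own record is quiesced: we are locked out of the database.
    Refuse,
}

// Stubs standing in for the real datastore calls discussed in this thread.
async fn check_schema_and_access() -> Result<DatastoreSetupAction, String> {
    Ok(DatastoreSetupAction::Update)
}
async fn attempt_handoff(_nexus_id: u64) -> Result<(), String> {
    Ok(())
}

async fn wait_for_datastore() {
    loop {
        match check_schema_and_access().await {
            // Database unreachable: treat as transient and re-run the check.
            Err(_) => {}
            Ok(DatastoreSetupAction::Update) => return,
            Ok(DatastoreSetupAction::NeedsHandoff { nexus_id }) => {
                // Any handoff error (including a defensive NexusInWrongState)
                // drops us back to check_schema_and_access on the next pass,
                // so we converge on whatever that check decides.
                if attempt_handoff(nexus_id).await.is_ok() {
                    return;
                }
            }
            Ok(DatastoreSetupAction::Refuse) => return,
        }
        tokio::time::sleep(std::time::Duration::from_secs(5)).await;
    }
}

In the actual code the retry is driven by the backoff machinery (BackoffError::transient, as in the hunk below), but the effect is the same: every retry starts again at check_schema_and_access rather than re-trying attempt_handoff in isolation.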

warn!(
    log,
    "Cannot check schema version / Nexus access";
    "error" => InlineErrorChain::new(err.as_ref()),
Contributor

Tiny nit - InlineErrorChain implements slog::KV and provides the "error" key, so this can be shortened to

Suggested change:
-    "error" => InlineErrorChain::new(err.as_ref()),
+    InlineErrorChain::new(err.as_ref()),

(Same nit below, but also I don't feel strongly about this so if you prefer the explicit key, feel free to leave them.)

Collaborator Author

Patched in bbb204c

@@ -108,11 +110,40 @@ async fn main_impl() -> anyhow::Result<()> {
}
Cmd::Upgrade { version } => {
Contributor

Not necessarily on this PR - should we do more to discourage the use of this tool? If something goes wrong with handoffs we'll need it, but in general we shouldn't run this anymore (once the whole stack of work lands), right?

Collaborator Author

totally agreed - I think the main usage of this is from https://github.com/oxidecomputer/meta/blob/master/engineering/dogfood/overview.adoc , which we can and should patch once we're ready to go to the full "online update" world for Nexus.
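
Purely as a hypothetical sketch of what "discouraging the tool" could look like in code (the thread above only commits to patching the dogfood docs; the flag name and wording here are invented and not part of this PR):

// Hypothetical guard for the schema-updater's manual `upgrade` path.
fn confirm_manual_upgrade(allow_manual: bool) -> anyhow::Result<()> {
    if !allow_manual {
        anyhow::bail!(
            "Nexus normally manages schema updates itself; re-run with \
             --allow-manual-upgrade only for recovery scenarios"
        );
    }
    eprintln!("WARNING: manual schema upgrades are intended for recovery only");
    Ok(())
}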

smklein added a commit that referenced this pull request Sep 3, 2025
Split off of #8845

Adds and tests handoff-related queries which will be used in integration
(reading on boot): #8925

Does not actually flip Nexus to use these records yet.

Depends on #8931

Next part of #8501: Adding queries for these records
Base automatically changed from db_metadata_nexus_handoff to main September 3, 2025 18:56
@davepacheco
Collaborator

The changes here look good. I want to check my understanding of the risk and impact of this.

During the next MUPdate from, say, R16, in chronological order:

  • Nexus instances will come up, find an old schema, then find that they have implicit access (because the schema is too old), get to DataStoreAction::Update, that will fail because they're not configured with the schema versions, and repeat the retry loop.
  • The schema updater tool will also find an old schema, not bother to do an access check because it doesn't care (giving it implicit access), and also wind up with DataStoreAction::Update. But it'll have the schema versions configured so it will do the schema update.
  • The existing Nexus instances, on their next loop, will wind up finding a current schema and explicit access because the schema update will have inserted records for them into the table.
  • On all subsequent restarts of Nexus after that point, they'll go through that same path.

If there's a subsequent blueprint execution, it won't change the records (assuming no zones have been added or expunged). So again, for all subsequent restarts of Nexus after that point, they'll go through that same flow.

If we MUPdate again after that:

  • Nexus instances will come up, find an old schema, but still do an access check because the schema is not too old to do that. They will find explicit access. The result is the same DataStoreAction::Update as above and they'll still loop again.
  • The schema updater will see the old schema, not bother with an access check, get DataStoreAction::Update, and then update the schema.
  • As above, the existing instances on their next loop will find the current schema and explicit access.
  • As before, subsequent restarts and blueprint execution change nothing.

Subsequent MUPdates are like that one until we change the software further (e.g., to have Nexus do schema updates).

Is that all correct? What, if any of that, do you think we want to test manually before landing this?

@smklein
Collaborator Author

smklein commented Sep 4, 2025

  • Nexus instances will come up, find an old schema, then find that they have implicit access (because the schema is too old), get to DataStoreAction::Update, that will fail because they're not configured with the schema versions, and repeat the retry loop.

Correct, a Nexus which upgrades and has an old schema will run check_schema_and_access, and the implicit access will be granted because the schema in the DB is older than DB_METADATA_NEXUS_SCHEMA_VERSION. This results in SchemaStatus::OlderThanDesiredSkipAccessCheck + NexusAccess::HasImplicitAccess, which creates DatastoreSetupAction::Update.

Without #8912 , this will fail, and hit the retry loop.
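
Sketched as code (the variant names here come from this thread; the enum shapes and the helper function are illustrative assumptions, not the actual omicron definitions):

enum SchemaStatus {
    // The deployed schema predates DB_METADATA_NEXUS_SCHEMA_VERSION, so the
    // access-record table may not exist yet and the access check is skipped.
    OlderThanDesiredSkipAccessCheck,
    // ... other variants omitted
}

enum NexusAccess {
    // Granted automatically when the schema is too old to hold access records.
    HasImplicitAccess,
    // ... other variants omitted
}

enum DatastoreSetupAction {
    // Keep retrying until something (the schema-updater today, Nexus itself
    // once #8912 lands) brings the schema up to date.
    Update,
    // ... other variants omitted
}

fn first_mupdate_action(
    schema: SchemaStatus,
    access: NexusAccess,
) -> DatastoreSetupAction {
    match (schema, access) {
        (
            SchemaStatus::OlderThanDesiredSkipAccessCheck,
            NexusAccess::HasImplicitAccess,
        ) => DatastoreSetupAction::Update,
    }
}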

  • The schema updater tool will also find an old schema, not bother to do an access check because it doesn't care (giving it implicit access), and also wind up with DataStoreAction::Update. But it'll have the schema versions configured so it will do the schema update.

Yup.

  • The existing Nexus instances, on their next loop, will wind up finding a current schema and explicit access because the schema update will have inserted records for them into the table.

Yeah, populating these records should come from schema/crdb/populate-db-metadata-nexus/up04.sql.

  • On all subsequent restarts of Nexus after that point, they'll go through that same path.

If there's a subsequent blueprint execution, it won't change the records (assuming no zones have been added or expunged). So again, for all subsequent restarts of Nexus after that point, they'll go through that same flow.

Even if those zones are added or expunged, the records should be updated, as a part of #8924

But if the set of zones stay the same, these records should stay the same too.

If we MUPdate again after that:

  • Nexus instances will come up, find an old schema, but still do an access check because the schema is not too old to do that. They will find explicit access. The result is the same DataStoreAction::Update as above and they'll still loop again.
  • The schema updater will see the old schema, not bother with an access check, get DataStoreAction::Update, and then update the schema.
  • As above, the existing instances on their next loop will find the current schema and explicit access.
  • As before, subsequent restarts and blueprint execution change nothing.

Subsequent MUPdates are like that one until we change the software further (e.g., to have Nexus do schema updates).

Yup, this matches my understanding.

Is that all correct? What, if any of that, do you think we want to test manually before landing this?

WDYT about a deployment of:

  • Have old system
  • Mupdate to this version
  • Check that "the old path for schema update still works", as expected?

(I can do that manually, or we could merge this and do it as a part of the dogfood rollout - either way would be fine with me)

@davepacheco
Collaborator

Is that all correct? What, if any of that, do you think we want to test manually before landing this?
WDYT about a deployment of:

* Have old system

* Mupdate to this version

* Check that "the old path for schema update still works", as expected?

(I can do that manually, or we could merge this and do it as a part of the dogfood rollout - either way would be fine with me)

By "the old path for schema update still works", you mean basically doing this?

  • you do the mupdate
  • you verify that the new Nexus instances are in a holding pattern
  • you do the schema update
  • you verify that the new Nexus instances are up again
  • maybe: restart them and make sure they come up

That seems pretty good. If it's easy enough to try before landing this, that seems worthwhile.

@smklein
Collaborator Author

smklein commented Sep 4, 2025

I'm using rkadm on berlin to test this right now, with:

$ /opt/rackletteadm/bin/rkadm --toml /opt/rackletteadm/configs/berlin/berlin-rkadm.toml run \
  --commit-a-install 032c5569fda856de03de0a5db1f4617351200c01 \
  --commit-b-upgrade fed7aa07df18d4f7d42fcf27194f1db61b7edf01 \
  --tuf-repo-output-dir-a-install /data/local/env/berlin/sean-a \
  --tuf-repo-output-dir-b-upgrade /data/local/env/berlin/sean-b

This upgrades from main (@ 032c556) to this PR (fed7aa0)

Will report back when that update is installed. Should be able to reboot a Nexus after that whole process too.

EDIT: Picked the wrong target commit; merging and trying again...

@smklein
Collaborator Author

smklein commented Sep 5, 2025


Okay, took me a couple times, because typing the right commit hashes is apparently a daunting task. Getting the wrong commit certainly showed me the schema update is running, and still doesn't want to downgrade!

From 032c5569fda856de03de0a5db1f4617351200c01 to fed7aa07df18d4f7d42fcf27194f1db61b7edf01:

  • The upgrade was successful (going from 188.0.0 -> 188.0.0), and booted with this PR
  • Using the crdb shell, I can see the db_metadata_nexus records for three Nexuses. They all look active.
  • I also see one blueprint, which has nexus_generation = 1.
  • If I restart Nexus, it reads the database, and comes back up online.

That test covers "main -> the addition of this PR".

I'm also going to upgrade using:

$ /opt/rackletteadm/bin/rkadm --toml /opt/rackletteadm/configs/berlin/berlin-rkadm.toml run \
  --commit-a-install 3be1d57fadcca52b82039df9d08ebdfed49b2868 \
  --commit-b-upgrade fed7aa07df18d4f7d42fcf27194f1db61b7edf01 \
  --tuf-repo-output-dir-a-install /data/local/env/berlin/sean-a \
  --tuf-repo-output-dir-b-upgrade /data/local/env/berlin/sean-b

To cover the case where we actually do a schema update, which will come from "a few commits ago -> the addition of this PR".

Note that 3be1d57 was before the chain of db_metadata schema changes merged.

@smklein
Collaborator Author

smklein commented Sep 5, 2025

Okay running:

$ /opt/rackletteadm/bin/rkadm --toml /opt/rackletteadm/configs/berlin/berlin-rkadm.toml run \
  --commit-a-install 3be1d57fadcca52b82039df9d08ebdfed49b2868 \
  --commit-b-upgrade fed7aa07df18d4f7d42fcf27194f1db61b7edf01 \
  --tuf-repo-output-dir-a-install /data/local/env/berlin/sean-a \
  --tuf-repo-output-dir-b-upgrade /data/local/env/berlin/sean-b

Also succeeds.

  • I see that this performs a schema upgrade with the schema-updater binary, just as we would do through a MUPdate process. Nexus comes up normally.
  • I also see the same data in cockroachDB (blueprint has a nexus_generation value, and the db_metadata_nexus records exist)
  • Restarting Nexus causes it to come back online.
